
472 research outputs found

Filling the gap between biology and computer science

Author: Aguilar-Ruiz Jesús S
Moore Jason H
Ritchie Marylyn D
Publication venue: BioMed Central
Publication date: 01/01/2008

This editorial introduces BioData Mining, a new journal which publishes research articles related to advances in computational methods and techniques for the extraction of useful knowledge from heterogeneous biological data. We outline the aims and scope of the journal, introduce the publishing model and describe the open peer review policy, which fosters interaction within the research community

Springer - Publisher Connector

PubMed Central

GPNN: Power Studies and Applications of a Neural Network Method for Detecting Gene-Gene Interactions in Studies of Human Disease

Author: Lee Stephen L
Mellick George
Motsinger Alison A
Ritchie Marylyn D
Publication venue: Dartmouth Digital Commons
Publication date: 25/01/2006

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Power of grammatical evolution neural networks to detect gene-gene interactions in the presence of error

Author: Davis Anna C
Fanelli Theresa J
Motsinger-Reif Alison A
Ritchie Marylyn D
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: With the advent of increasingly efficient means to obtain genetic information, a great insurgence of data has resulted, leading to the need for methods for analyzing this data beyond that of traditional parametric statistical approaches. Recently we introduced Grammatical Evolution Neural Network (GENN), a machine-learning approach to detect gene-gene or gene-environment interactions, also known as epistasis, in high dimensional genetic epidemiological data. GENN has been shown to be highly successful in a range of simulated data, but the impact of error common to real data is unknown. In the current study, we examine the power of GENN to detect interesting interactions in the presence of noise due to genotyping error, missing data, phenocopy, and genetic heterogeneity. Additionally, we compare the performance of GENN to that of another computational method, Multifactor Dimensionality Reduction (MDR). Findings: GENN is extremely robust to missing data and genotyping error. Phenocopy in a dataset reduces the power of both GENN and MDR. GENN is reasonably robust to genetic heterogeneity, and we find that in some cases GENN has substantially higher power than MDR to detect functional loci in the presence of genetic heterogeneity. Conclusion: GENN is a promising method to detect gene-gene interaction, even in the presence of common types of error found in real data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

Author: Buchanan Carrie C
Bush William S
Ritchie Marylyn D
Torstenson Eric S
Publication venue: BMJ Group

Crossref

PubMed Central

Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction

Author: Bush William S
Dudek Scott M
Edwards Todd L
McKinney Brett A
Ritchie Marylyn D
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Multifactor Dimensionality Reduction (MDR) has been introduced previously as a non-parametric statistical method for detecting gene-gene interactions. MDR performs a dimensional reduction by assigning multi-locus genotypes to either high- or low-risk groups and measuring the percentage of cases and controls incorrectly labelled by this classification, the classification error. The combination of variables that produces the lowest classification error is selected as the best or most fit model. The correctly and incorrectly labelled cases and controls can be expressed as a two-way contingency table. We sought to improve the ability of MDR to detect gene-gene interactions by replacing classification error with a different measure to score model quality. Results: In this study, we compare the detection and power of MDR using a variety of measures for two-way contingency table analysis. We simulated 40 genetic models, varying the number of disease loci in the model (2 – 5), allele frequencies of the disease loci (.2/.8 or .4/.6) and the broad-sense heritability of the model (.05 – .3). Overall, detection using NMI was 65.36% across all models and specific detection was 59.4%, versus 62% detection and 52.2% specific detection using classification error. Conclusion: Of the 10 measures evaluated, the likelihood ratio and normalized mutual information (NMI) are measures that consistently improve the detection and power of MDR in simulated data over using classification error. These measures also reduce the inclusion of spurious variables in a multi-locus model. Thus, MDR, which has already been demonstrated as a powerful tool for detecting gene-gene interactions, can be improved with the use of alternative fitness functions.
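The MDR scoring step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the toy data, and the high-risk rule (a genotype cell is high-risk when its case:control ratio exceeds the overall ratio in the sample) are assumptions made for illustration, and cross-validation and the search over variable combinations are omitted.

```python
from collections import Counter

def mdr_classification_error(genotypes, labels):
    """Score one candidate variable combination, MDR-style (illustrative only).

    genotypes: one tuple per sample holding its multi-locus genotype.
    labels:    1 for case, 0 for control.
    Returns (classification_error, two_way_contingency_table).
    """
    cases, controls = Counter(), Counter()
    for g, y in zip(genotypes, labels):
        (cases if y == 1 else controls)[g] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())

    tp = fp = fn = tn = 0
    for g in set(cases) | set(controls):
        # Assumed rule: high-risk if cases/controls in this cell exceeds
        # n_cases/n_controls overall (cross-multiplied to avoid dividing by zero).
        high_risk = cases[g] * n_controls > controls[g] * n_cases
        if high_risk:
            tp += cases[g]      # cases correctly labelled high-risk
            fp += controls[g]   # controls incorrectly labelled high-risk
        else:
            fn += cases[g]      # cases incorrectly labelled low-risk
            tn += controls[g]   # controls correctly labelled low-risk

    table = {"tp": tp, "fp": fp, "fn": fn, "tn": tn}
    error = (fp + fn) / (n_cases + n_controls)
    return error, table

# Toy data (made up): six samples genotyped at two loci.
toy_genotypes = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1)]
toy_labels = [1, 1, 0, 0, 0, 1]
error, table = mdr_classification_error(toy_genotypes, toy_labels)
```

The returned table is the two-way contingency table the abstract refers to; an alternative measure such as the likelihood ratio or NMI would be computed from the same four counts in place of the classification error.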

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases

Author: Hahn Lance W
Moore Jason H
Parker Joel S
Ritchie Marylyn D
White Bill C
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. RESULTS: Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. CONCLUSION: This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development

Author: Alex Frase
Carrie Moore
Daniel Wolfe
John Wallace
Marylyn D Ritchie
Neerja Katiyar
Sarah A Pendergrass
Publication venue: Springer Nature
Publication date: 30/12/2013

BACKGROUND: The ever-growing wealth of biological information available through multiple comprehensive database repositories can be leveraged for advanced analysis of data. We have now extensively revised and updated the multi-purpose software tool Biofilter that allows researchers to annotate and/or filter data as well as generate gene-gene interaction models based on existing biological knowledge. Biofilter now has the Library of Knowledge Integration (LOKI), for accessing and integrating existing comprehensive database information, including more flexibility for how ambiguity of gene identifiers is handled. We have also updated the way importance scores for interaction models are generated. In addition, Biofilter 2.0 now works with a range of types and formats of data, including single nucleotide polymorphism (SNP) identifiers, rare variant identifiers, base pair positions, gene symbols, genetic regions, and copy number variant (CNV) location information.
RESULTS: Biofilter provides a convenient single interface for accessing multiple publicly available human genetic data sources that have been compiled in the supporting database of LOKI. Information within LOKI includes genomic locations of SNPs and genes, as well as known relationships among genes and proteins such as interaction pairs, pathways and ontological categories. Via Biofilter 2.0 researchers can:
• Annotate genomic location or region based data, such as results from association studies, or CNV analyses, with relevant biological knowledge for deeper interpretation
• Filter genomic location or region based data on biological criteria, such as filtering a series of SNPs to retain only SNPs present in specific genes within specific pathways of interest
• Generate Predictive Models for gene-gene, SNP-SNP, or CNV-CNV interactions based on biological information, with priority for models to be tested based on biological relevance, thus narrowing the search space and reducing multiple hypothesis-testing.
CONCLUSIONS: Biofilter is a software tool that provides a flexible way to use the ever-expanding expert biological knowledge that exists to direct filtering, annotation, and complex predictive model development for elucidating the etiology of complex phenotypic outcomes.

Springer - Publisher Connector

PubMed Central


Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico

Author: Basile Anna O.
Pendergrass Sarah A.
Ritchie Marylyn D.
Zhang Xinyuan
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

Background: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression.
Conclusions: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.

Columbia University Academic Commons

Directory of Open Access Journals